Abstract — We present an upper bound to the reliability function of a Discrete Memoryless Channel based on a combination of Lovász's and Gallager's ideas. In particular, we introduce a function $\vartheta(\rho)$ that varies from the cut-off rate of a channel to the Lovász theta function as $\rho$ varies from $1$ to $\infty$. The obtained bound to the reliability, though loose in its present form, is finite for all rates larger than the Lovász theta function and shows interesting connections with Gallager's expurgated coefficient.

I. Introduction 

One of the most intriguing topics in coding theory is the problem of bounding the probability of error of optimal codes at low rates. While at high rates the asymptotic behaviour of the probability of error for optimal codes is now very well understood, very little is known in the low rate region. Shannon [1] introduced the notion of channel capacity $C$, which represents the largest rate at which information can be sent through the channel with probability of error that vanishes with increasing block-length. He then also introduced [2] the notion of zero-error capacity $C_0$ as the largest rate at which information can be sent with probability of error precisely equal to zero. For rates in the range $C_0 < R < C$, the probability of error is known to decrease exponentially in the block-length $n$ as

$$P_e \approx e^{-nE(R)}, \qquad (1)$$
where $E(R)$ is the so-called reliability function of the channel. Determining $E(R)$ for small $R$, and even determining $C_0$, are unsolved problems, and only upper and lower bounds for these quantities are known. Lovász gave an important contribution in upper bounding $C_0$, thus enlarging the range of values over which $E(R)$ is known to be finite. However, Lovász's result was never exploited to find actual bounds to $E(R)$ in the region of rates immediately above his bound.

In this paper, we propose an upper bound to the reliability function $E(R)$ of a discrete memoryless channel that can be interpreted as a combination of Lovász's and Gallager's ideas, with the aim of giving at least a crude upper bound to $E(R)$ for those rates that Lovász's own result proves to be strictly larger than the zero-error capacity. The intent, however, is to obtain Lovász's bound as a consequence of an upper bound on $E(R)$ and not vice versa. The obtained bound to $E(R)$ is loose in general, but it has two important merits. First, it reveals an interesting analogy between the Lovász theta function, the cut-off rate, and the expurgated coefficient of Gallager [9]. We will in fact introduce a function $\vartheta(\rho)$ that varies from the cut-off rate of the channel, when $\rho = 1$, to the Lovász theta function, when $\rho \to \infty$. The idea is to keep Lovász's result in mind as a target, but to build it as the limit of a smoother construction. This construction is related to that of the expurgated bound, but it is used precisely in the opposite direction. We believe this analogy could shed new light on the topic and deserves further study. Second, this bound can be generalized, in the context of quantum information theory, to bound the reliability of a general channel by means of the sphere-packing bound applied to classical-quantum auxiliary channels. This second aspect is only mentioned here and cannot be analyzed in detail due to space limitations.

II. Basic notions 

A. Reliability of DMC's 

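Let $W(y|x)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$, be the transition probabilities of a discrete memoryless channel $W : \mathcal{X} \to \mathcal{Y}$, where $\mathcal{X} = \{1, 2, \ldots, K\}$ and $\mathcal{Y} = \{1, 2, \ldots, J\}$ are finite sets. For a sequence $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ and a sequence $\mathbf{y} = (y_1, y_2, \ldots, y_n) \in \mathcal{Y}^n$, the probability of observing $\mathbf{y}$ at the output of the channel given $\mathbf{x}$ at the input is

$$W^{(n)}(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{n} W(y_i|x_i). \qquad (2)$$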



A block code with $M$ messages and block-length $n$ is a mapping from a set $\{1, 2, \ldots, M\}$ of $M$ messages onto a set $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M\}$ of $M$ sequences in $\mathcal{X}^n$. The rate $R$ of the code is defined as $R = (\log M)/n$. A decoder is a mapping from $\mathcal{Y}^n$ into the set of possible messages $\{1, 2, \ldots, M\}$. If message $m$ is to be sent, the encoder transmits the codeword $\mathbf{x}_m$ through the channel. An output sequence $\mathbf{y}$ is received by the decoder, which maps it to a message $\hat{m}$. An error occurs if $\hat{m} \neq m$.

Let $\mathcal{Y}_m \subset \mathcal{Y}^n$ be the set of output sequences that are mapped into message $m$. When message $m$ is sent, the probability of error is

$$P_{e|m} = \sum_{\mathbf{y} \notin \mathcal{Y}_m} W^{(n)}(\mathbf{y}|\mathbf{x}_m). \qquad (3)$$

The maximum error probability of the code is defined as the largest $P_{e|m}$, that is,

$$P_{e,\max} = \max_{m} P_{e|m}. \qquad (4)$$
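Definitions (2)–(4) can be made concrete by brute force on a tiny example. The sketch below is only an illustration under assumptions not in the text: a binary symmetric channel with crossover probability $p$, a length-3 repetition code, and a maximum-likelihood decoder with ties broken toward the smallest message index. It enumerates all outputs, assigns each to a decision region $\mathcal{Y}_{\hat m}$, and accumulates $P_{e|m}$ as in (3).

```python
import itertools


def w_n(y, x, p):
    """W^(n)(y|x) for a binary symmetric channel with crossover p, as the product (2)."""
    prob = 1.0
    for yi, xi in zip(y, x):
        prob *= (1 - p) if yi == xi else p
    return prob


def error_probabilities(code, p):
    """Return [P_{e|m}] under maximum-likelihood decoding (ties go to the smallest index)."""
    n = len(code[0])
    p_err = [0.0] * len(code)
    for y in itertools.product((0, 1), repeat=n):
        likelihoods = [w_n(y, x, p) for x in code]
        # y is placed in the decision region of m_hat
        m_hat = max(range(len(code)), key=lambda m: likelihoods[m])
        for m in range(len(code)):
            if m != m_hat:
                # y lies outside Y_m, so it contributes to P_{e|m} in (3)
                p_err[m] += likelihoods[m]
    return p_err


code = [(0, 0, 0), (1, 1, 1)]            # length-3 repetition code, M = 2, R = (log 2)/3
p_err = error_probabilities(code, 0.1)
p_e_max = max(p_err)                     # P_{e,max} as in (4)
```

Here both messages fail exactly when at least two of the three symbols are flipped, so $P_{e|1} = P_{e|2} = 3p^2(1-p) + p^3$.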



Let $P_{e,\max}^{(n)}(R)$ be the minimum maximum error probability among all codes of length $n$ and rate at least $R$. Shannon's theorem [1] states that sequences of codes exist such that $P_{e,\max}^{(n)}(R) \to 0$ as $n \to \infty$ for all rates smaller than a constant $C$, called the channel capacity. For $R < C$, Shannon's theorem only asserts that $P_{e,\max}^{(n)}(R) \to 0$ as $n \to \infty$, without specifying how fast. For a range of rates $C_0 < R < C$, the optimal probability of error $P_{e,\max}^{(n)}(R)$ is known to decrease exponentially in $n$, and it is thus useful to define the reliability function of the channel as

$$E(R) = \limsup_{n \to \infty} -\frac{1}{n} \log P_{e,\max}^{(n)}(R). \qquad (5)$$

The value $C_0$ is the so-called zero-error capacity, also introduced by Shannon [2], which is defined as the highest rate at which communication is possible with probability of error precisely equal to zero. More formally,

$$C_0 = \sup\{R : P_{e,\max}^{(n)}(R) = 0 \text{ for some } n\}. \qquad (6)$$

For $R < C_0$, we may define the reliability function $E(R)$ as being infinite. Note that $C_0 > 0$ if and only if there are at least two input symbols $x$ and $x'$ which are not confusable at the output, meaning that $W(y|x)W(y|x') = 0$ for all values of $y$. Determining the reliability function $E(R)$ (at low positive rates) and the zero-error capacity $C_0$ of a general channel is still an unsolved problem.

One of the most famous results in this direction is Lovász's upper bound to $C_0$. Lovász proved that $C_0$ is upper bounded by a quantity $\vartheta$ defined as

$$\vartheta = \min_{\{u_x\}} \min_{c} \max_{x} \log \frac{1}{|u_x^\dagger c|^2},$$

where $\{u_x\}_{x \in \mathcal{X}}$ runs over all sets of unit norm vectors in any Hilbert space such that $u_x$ and $u_{x'}$ are orthogonal if symbols $x$ and $x'$ are not confusable, $c$ runs over all unit norm vectors, and $^\dagger$ denotes conjugate transpose. Here, $u_x^\dagger c$ is the scalar product.
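As a concrete instance of this definition, the sketch below evaluates Lovász's classical "umbrella" for the pentagon, that is, a channel with five inputs where $x$ is confusable only with $x \pm 1 \pmod 5$. The ribs $u_x$ are unit vectors in $\mathbb{R}^3$ spread around the handle $c$ with $\cos^2$ of the opening angle equal to $1/\sqrt{5}$; this choice makes non-confusable pairs orthogonal and yields the value $\log\sqrt{5}$, which Lovász showed to be optimal for the pentagon. The specific vectors are the textbook construction, not something taken from this paper.

```python
import math

K = 5                                    # pentagon: x is confusable with x - 1, x, x + 1 (mod 5)
cos2 = 1 / math.sqrt(5)                  # squared cosine of the umbrella opening angle
cos_t, sin_t = math.sqrt(cos2), math.sqrt(1 - cos2)

c = (1.0, 0.0, 0.0)                      # the handle
u = [
    (cos_t, sin_t * math.cos(2 * math.pi * k / K), sin_t * math.sin(2 * math.pi * k / K))
    for k in range(K)
]                                        # the ribs u_x, all of unit norm


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def confusable(x, xp):
    return (x - xp) % K in (0, 1, K - 1)


# largest |u_x . u_x'| over non-confusable pairs; it vanishes up to rounding
worst = max(abs(dot(u[x], u[xp])) for x in range(K) for xp in range(K) if not confusable(x, xp))
# the objective max_x log 1/|u_x . c|^2 for this particular (u, c)
value = max(math.log(1 / dot(ux, c) ** 2) for ux in u)
```

Any feasible pair $(\{u_x\}, c)$ gives an upper bound on $\vartheta$; here the bound is $\tfrac{1}{2}\log 5$.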

B. Bhattacharyya distances and scalar products 

Here, we briefly recall some important connections between the reliability function $E(R)$ and the Bhattacharyya distance between codewords. This connection is of great importance since the Bhattacharyya distance between distributions is related to a scalar product between unit norm vectors in a Hilbert space. It is this property that creates an underlying common substrate for Lovász's approach and for bounding the reliability function.

For a generic input symbol $x$, consider the unit norm $|\mathcal{Y}|$-dimensional column vector $\psi_x$ with components $\psi_x(y) = \sqrt{W(y|x)}$. We call this the state vector of input symbol $x$, in analogy with the input signals of pure-state classical-quantum channels. In the same way, for an input sequence $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, consider the unit norm $|\mathcal{Y}|^n$-dimensional column vector $\Psi_{\mathbf{x}}$ whose components are the values $\sqrt{W^{(n)}(\mathbf{y}|\mathbf{x})}$, that is, $\Psi_{\mathbf{x}}$ is simply the element-wise square root of the conditional output distribution given the input sequence $\mathbf{x}$. Then, since the channel is memoryless, we can write

$$\Psi_{\mathbf{x}} = \psi_{x_1} \otimes \psi_{x_2} \otimes \cdots \otimes \psi_{x_n}, \qquad (7)$$

where $\otimes$ is the Kronecker product. For ease of notation, let $\Psi_m$ be the state vector of the codeword $\mathbf{x}_m$; then we can represent our code $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M\}$ by means of the associated state vectors $\{\Psi_1, \Psi_2, \ldots, \Psi_M\}$. Since all square roots are taken positive, note that our channel has a positive zero-error capacity if and only if there are at least two state vectors $\psi_x$, $\psi_{x'}$ such that $\psi_x^\dagger \psi_{x'} = 0$. This implies that codes can be built such that $\Psi_m^\dagger \Psi_{m'} = 0$ for some $m, m'$, that is, the two codewords $m$ and $m'$ cannot be confused at the output.
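The tensor structure in (7) can be checked directly: the sketch below builds $\Psi_{\mathbf{x}}$ by repeated Kronecker products of single-letter state vectors and verifies that the scalar product of two codeword state vectors factorizes into per-letter scalar products. The binary channel matrix `W` and the two sequences are hypothetical values chosen only for illustration.

```python
import math


def state_vector(w_row):
    """psi_x with components sqrt(W(y|x)), for one row W(.|x) of the channel."""
    return [math.sqrt(v) for v in w_row]


def kron(a, b):
    """Kronecker product of two vectors stored as lists (first factor is the slow index)."""
    return [ai * bj for ai in a for bj in b]


def codeword_state(x, W):
    """Psi_x = psi_{x_1} (x) psi_{x_2} (x) ... (x) psi_{x_n}, as in (7)."""
    out = [1.0]
    for xi in x:
        out = kron(out, state_vector(W[xi]))
    return out


def inner(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


W = [[0.9, 0.1], [0.2, 0.8]]             # hypothetical binary-input, binary-output channel
x, xp = (0, 1, 1), (1, 1, 0)
Psi, Psi_p = codeword_state(x, W), codeword_state(xp, W)
overlap = inner(Psi, Psi_p)              # Psi_x^dagger Psi_x'
per_letter = math.prod(inner(state_vector(W[a]), state_vector(W[b])) for a, b in zip(x, xp))
```

Because every component is non-negative, the overlap vanishes exactly when some letter pair $(x_i, x'_i)$ is non-confusable, which is the zero-error condition stated above.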
More generally, the scalar product $\Psi_m^\dagger \Psi_{m'}$ plays an important role since it is related to the so-called Bhattacharyya distance between the two codewords $m$ and $m'$. In particular, in a binary hypothesis test between codewords $m$ and $m'$, an extension of the Chernoff bound allows us to assert that the minimum error probability asymptotically satisfies [3]

$$P_e \doteq \min_{0 \le s \le 1} \sum_{\mathbf{y}} W^{(n)}(\mathbf{y}|\mathbf{x}_m)^{1-s}\, W^{(n)}(\mathbf{y}|\mathbf{x}_{m'})^{s}, \qquad (8)$$

where $\doteq$ means equivalence to the first order in the exponent. For $s = 1/2$, the sum above obviously equals $\Psi_m^\dagger \Psi_{m'}$. It is easily shown that the minimum above is always between $(\Psi_m^\dagger \Psi_{m'})^2$ and $\Psi_m^\dagger \Psi_{m'}$, and it equals the latter for a class of channels, called pairwise reversible channels, that have some symmetry with respect to the input symbols.¹ Obviously, for a given code, the probability of error $P_{e,\max}$ is lower bounded, at least to the first order in the exponent, by the probability of error in each binary hypothesis test between two codewords. Hence, we find that $P_{e,\max}$ asymptotically satisfies

$$-\frac{1}{n} \log P_{e,\max} \le -\frac{2}{n} \log \max_{m \neq m'} \Psi_m^\dagger \Psi_{m'} + o(1), \qquad (9)$$

where the coefficient 2 can be removed if the channel is pairwise reversible. It is thus obvious that it is possible to upper bound $E(R)$ by lower bounding the quantity

$$\gamma = \max_{m \neq m'} \Psi_m^\dagger \Psi_{m'}. \qquad (10)$$

Lovász's work aims at finding a value $\vartheta$ as small as possible that allows one to conclude that, for a set of $M = e^{nR} > e^{n\vartheta}$ codewords, $\gamma$ cannot be zero, and thus at least two codewords are confusable. Here, instead, we want something more, that is, finding a lower bound on $\gamma$ for each code with rate $R > \vartheta$, so as to deduce an upper bound to $E(R)$ for all $R > \vartheta$.
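The bracketing of the Chernoff minimum in (8) between $(\Psi_m^\dagger\Psi_{m'})^2$ and $\Psi_m^\dagger\Psi_{m'}$ is easy to see numerically on single-letter pairs of distributions; the same reasoning applies verbatim to $W^{(n)}(\cdot|\mathbf{x}_m)$ and $W^{(n)}(\cdot|\mathbf{x}_{m'})$. The sketch below approximates the minimum over $s$ on a grid (an assumption made for simplicity, not an exact optimizer). For a symmetric (BSC-like) pair the minimum sits at $s = 1/2$ and equals the Bhattacharyya coefficient; for a Z-channel pair it approaches the squared coefficient, so the factor 2 in (9) is really needed there.

```python
import math


def chernoff_min(wa, wb, steps=999):
    """Approximate min over 0 < s < 1 of sum_y wa(y)^(1-s) wb(y)^s on a uniform grid.

    Outputs where either probability is zero contribute nothing for 0 < s < 1 and are skipped.
    """
    best = float("inf")
    for k in range(1, steps + 1):
        s = k / (steps + 1)
        total = sum(a ** (1 - s) * b ** s for a, b in zip(wa, wb) if a > 0 and b > 0)
        best = min(best, total)
    return best


def bhattacharyya(wa, wb):
    """sum_y sqrt(wa(y) wb(y)), i.e. the scalar product of the two state vectors."""
    return sum(math.sqrt(a * b) for a, b in zip(wa, wb))


bsc = ([0.9, 0.1], [0.1, 0.9])           # symmetric pair: minimum attained at s = 1/2
zch = ([1.0, 0.0], [0.3, 0.7])           # Z-channel pair: minimum close to the lower end
results = {
    name: (chernoff_min(*pair), bhattacharyya(*pair))
    for name, pair in [("bsc", bsc), ("z", zch)]
}
```

For the Z-channel pair the Bhattacharyya coefficient is $\sqrt{0.3}$, while the Chernoff minimum tends to $0.3$ as $s \to 1$.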

III. An "umbrella" bound 

Consider the scalar products between the channel state vectors, $\psi_x^\dagger \psi_{x'} \ge 0$. For a fixed $\rho \ge 1$, consider then a set of "tilted" state vectors, that is, unit norm vectors $\{\tilde{\psi}_x\}$ in any Hilbert space such that $|\tilde{\psi}_x^\dagger \tilde{\psi}_{x'}| \le (\psi_x^\dagger \psi_{x'})^{1/\rho}$. We call such a set of vectors $\{\tilde{\psi}_k\}$ an orthonormal representation of degree $\rho$ of our channel, and call $\Gamma(\rho)$ the set of all possible such representations,

$$\Gamma(\rho) = \left\{ \{\tilde{\psi}_k\} : |\tilde{\psi}_k^\dagger \tilde{\psi}_j| \le (\psi_k^\dagger \psi_j)^{1/\rho} \right\}. \qquad (11)$$

¹ Pairwise reversible channels are those for which, somewhat tautologically, the minimum in (8) is achieved for $s = 1/2$.

Observe that $\Gamma(\rho)$ is non-empty since the original vectors $\psi_k$ satisfy the constraints (recall that $0 \le \psi_k^\dagger \psi_j \le 1$ and $\rho \ge 1$). The value of an orthonormal representation is the quantity

$$V(\{\tilde{\psi}_k\}) = \min_{f} \max_{k} \log \frac{1}{|\tilde{\psi}_k^\dagger f|^2}, \qquad (12)$$

where the minimum is over all unit norm vectors $f$. The optimal choice of the vector $f$ is called, with Lovász, the handle of the representation. We call it $f$ to point out that this vector plays essentially the same role as the auxiliary output distribution $f$ used in the sphere-packing bound of [4]. Due to space limitations, we cannot discuss this detail here; see [?].
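To make (11) and (12) concrete, the sketch below checks membership in $\Gamma(\rho)$ and evaluates the objective of (12) for one fixed handle $f$ (so it only gives an upper bound on the value, since (12) also minimizes over $f$). The example is a binary symmetric channel with $p = 0.1$, and the degree-2 representation is a hypothetical choice: two unit vectors in $\mathbb{R}^2$ whose overlap is exactly $(\psi_0^\dagger\psi_1)^{1/2}$, with the bisector as handle. Tilting the vectors closer together lowers the value compared with $\rho = 1$.

```python
import math


def gram(vectors):
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


def in_gamma(tilted, psi, rho, tol=1e-12):
    """Check unit norms and |t_k . t_j| <= (psi_k . psi_j)^(1/rho) for k != j, as in (11)."""
    gt, gp = gram(tilted), gram(psi)
    K = len(psi)
    for k in range(K):
        if abs(gt[k][k] - 1) > tol:
            return False
        for j in range(K):
            if j != k and abs(gt[k][j]) > gp[k][j] ** (1 / rho) + tol:
                return False
    return True


def value_for_handle(tilted, f):
    """max_k log 1/|t_k . f|^2 for one fixed unit-norm f; (12) minimizes this over f."""
    return max(-2 * math.log(abs(sum(a * b for a, b in zip(t, f)))) for t in tilted)


p = 0.1
psi = [(math.sqrt(1 - p), math.sqrt(p)), (math.sqrt(p), math.sqrt(1 - p))]  # BSC state vectors
z = 2 * math.sqrt(p * (1 - p))                                            # psi_0 . psi_1

# hypothetical degree-2 representation: overlap exactly z^(1/2)
alpha = 0.5 * math.acos(z ** 0.5)
tilted = [(math.cos(alpha), math.sin(alpha)), (math.cos(alpha), -math.sin(alpha))]

f_uniform = (1 / math.sqrt(2), 1 / math.sqrt(2))
v_rho1 = value_for_handle(psi, f_uniform)         # rho = 1 with the original vectors
v_rho2 = value_for_handle(tilted, (1.0, 0.0))     # rho = 2 with the tilted vectors
```

The tilted pair is admissible for $\rho = 2$ but not for $\rho = 1$, which reflects how the constraints in (11) relax as $\rho$ grows.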

Call now $\vartheta(\rho)$ the minimum value over all representations of degree $\rho$,

$$\vartheta(\rho) = \min_{\{\tilde{\psi}_k\} \in \Gamma(\rho)} V(\{\tilde{\psi}_k\}). \qquad (13)$$

We have the following result.

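Theorem 1: For any code of block-length $n$ with $M$ codewords and any $\rho \ge 1$ we have

$$\max_{i} \frac{1}{M-1} \sum_{j \neq i} \Psi_i^\dagger \Psi_j \ge \left( \frac{M e^{-n\vartheta(\rho)} - 1}{M - 1} \right)^{\rho}. \qquad (14)$$

(The right-hand side is meaningful only when $M e^{-n\vartheta(\rho)} \ge 1$; otherwise the statement is trivial.)

Corollary 1: For the reliability function of a general DMC we have the bound

$$E(R) \le 2\rho\,\vartheta(\rho), \qquad R > \vartheta(\rho). \qquad (15)$$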



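Proof: For an input sequence $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ call, in analogy with (7), $\tilde{\Psi}_{\mathbf{x}} = \tilde{\psi}_{x_1} \otimes \tilde{\psi}_{x_2} \otimes \cdots \otimes \tilde{\psi}_{x_n}$. Observe first that, for any two input sequences $\mathbf{x}$ and $\mathbf{x}'$, we have

$$\begin{aligned}
|\tilde{\Psi}_{\mathbf{x}}^\dagger \tilde{\Psi}_{\mathbf{x}'}| &= \prod_{i=1}^{n} |\tilde{\psi}_{x_i}^\dagger \tilde{\psi}_{x'_i}| \qquad (16)\\
&\le \prod_{i=1}^{n} (\psi_{x_i}^\dagger \psi_{x'_i})^{1/\rho} \qquad (17)\\
&= (\Psi_{\mathbf{x}}^\dagger \Psi_{\mathbf{x}'})^{1/\rho}. \qquad (18)
\end{aligned}$$

Furthermore, note that, for an optimal representation of degree $\rho$ with handle $f$, we have $|\tilde{\psi}_x^\dagger f|^2 \ge e^{-\vartheta(\rho)}$ for all $x$. Set now $F = f^{\otimes n}$. We then have

$$\begin{aligned}
|\tilde{\Psi}_{\mathbf{x}}^\dagger F|^2 &= \prod_{i=1}^{n} |\tilde{\psi}_{x_i}^\dagger f|^2 \qquad (19)\\
&\ge e^{-n\vartheta(\rho)}. \qquad (20)
\end{aligned}$$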

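Let us first check how Lovász's bound is obtained. Lovász's approach is to bound the number $M$ of codewords with orthogonal state vectors, using the property that if $\tilde{\Psi}_1, \tilde{\Psi}_2, \ldots, \tilde{\Psi}_M$ form an orthonormal set, then

$$\begin{aligned}
1 = \|F\|^2 &\ge \sum_{m=1}^{M} |\tilde{\Psi}_m^\dagger F|^2 \qquad (21)\\
&\ge \sum_{m=1}^{M} e^{-n\vartheta(\rho)} \qquad (22)\\
&= M e^{-n\vartheta(\rho)}. \qquad (23)
\end{aligned}$$

Hence, if $M > e^{n\vartheta(\rho)}$, there are at least two non-orthogonal vectors in the set, say $|\tilde{\Psi}_m^\dagger \tilde{\Psi}_{m'}|^2 > 0$. But this implies, by (18), that $(\Psi_m^\dagger \Psi_{m'})^2 \ge |\tilde{\Psi}_m^\dagger \tilde{\Psi}_{m'}|^{2\rho} > 0$. Hence, if $R > \vartheta(\rho)$, no zero-error code can exist. We still have freedom in the choice of $\rho$, and it is obvious that larger values of $\rho$ can only give better results, since the constraints in (11) become weaker as $\rho$ grows. Hence, it is preferable to simply work in the limit $\rho \to \infty$ and thus build the representation $\tilde{\psi}_1, \tilde{\psi}_2, \ldots, \tilde{\psi}_K$ under the only constraint that $\tilde{\psi}_i^\dagger \tilde{\psi}_j = 0$ whenever $\psi_i^\dagger \psi_j = 0$. This gives precisely Lovász's result.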
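The counting argument (21)–(23) can be watched at work on the pentagon, where it is tight. The sketch below assumes the pentagon umbrella representation (ribs with $\cos^2$ of the opening angle equal to $1/\sqrt{5}$, handle $f = e_1$) and Shannon's classical length-2 zero-error code $\{(i, 2i \bmod 5)\}$; neither choice comes from this paper. The five tilted codeword vectors are orthonormal, each has $|\tilde{\Psi}_m^\dagger F|^2 = e^{-2\vartheta} = 1/5$, and the Bessel sum in (21) equals $\|F\|^2 = 1$, so $M = e^{n\vartheta}$ exactly.

```python
import itertools
import math

K = 5
cos_t = 5 ** -0.25                        # cos^2 = 1/sqrt(5): Lovasz's umbrella for the pentagon
sin_t = math.sqrt(1 - cos_t ** 2)
ribs = [
    (cos_t, sin_t * math.cos(2 * math.pi * k / K), sin_t * math.sin(2 * math.pi * k / K))
    for k in range(K)
]
f = (1.0, 0.0, 0.0)                       # handle of the representation


def kron(a, b):
    return [ai * bj for ai in a for bj in b]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


code = [(i, (2 * i) % K) for i in range(K)]           # Shannon's length-2 zero-error code
states = [kron(ribs[a], ribs[b]) for a, b in code]    # tilted codeword vectors
F = kron(f, f)                                        # F = f (x) f, unit norm

max_overlap = max(abs(dot(states[i], states[j])) for i, j in itertools.combinations(range(K), 2))
projections = [dot(s, F) ** 2 for s in states]        # each equals e^{-2 theta} = 1/5
bessel_sum = sum(projections)                         # bounded by |F|^2 = 1 as in (21)
```

Two distinct codewords differ by $d$ in the first coordinate and by $2d \pmod 5$ in the second, and one of the two differences is always $\pm 2$, i.e. a non-confusable pair.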

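Now, instead of bounding $R$ under the hypothesis of zero-error communication, we want to bound the probability of error for a given $R > \vartheta(\rho)$. Considering the tilted state vectors of the code, we can rewrite equation (20) as

$$\begin{aligned}
|\tilde{\Psi}_m^\dagger F|^2 &= F^\dagger (\tilde{\Psi}_m \tilde{\Psi}_m^\dagger) F \qquad (24)\\
&\ge e^{-n\vartheta(\rho)}. \qquad (25)
\end{aligned}$$

The second expression above has the benefit of easily allowing averaging over different codewords. So, we can average this expression over all $m$ and, defining the matrix $\Phi = (\tilde{\Psi}_1, \ldots, \tilde{\Psi}_M)/\sqrt{M}$, we get

$$F^\dagger \Phi \Phi^\dagger F \ge e^{-n\vartheta(\rho)}. \qquad (26)$$

Since $F$ is a unit norm vector, this implies that the matrix $\Phi\Phi^\dagger$ has at least one eigenvalue larger than or equal to $e^{-n\vartheta(\rho)}$. This in turn implies that the matrix $\Phi^\dagger\Phi$ also has an eigenvalue larger than or equal to $e^{-n\vartheta(\rho)}$, since $\Phi\Phi^\dagger$ and $\Phi^\dagger\Phi$ have the same nonzero eigenvalues; that is,

$$\lambda_{\max}(\Phi^\dagger \Phi) \ge e^{-n\vartheta(\rho)}. \qquad (27)$$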



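It is known that for a given matrix $A$ with elements $\{A_{ij}\}$, the following inequality holds:

$$\lambda_{\max}(A) \le \max_i \sum_j |A_{ij}|. \qquad (28)$$

Using this inequality with $A = \Phi^\dagger\Phi$, since $A_{ij} = \tilde{\Psi}_i^\dagger \tilde{\Psi}_j / M$, we get

$$\begin{aligned}
e^{-n\vartheta(\rho)} &\le \max_i \sum_j \frac{|\tilde{\Psi}_i^\dagger \tilde{\Psi}_j|}{M} \qquad (29)\\
&= \max_i \frac{1}{M} \Big( 1 + \sum_{j \neq i} |\tilde{\Psi}_i^\dagger \tilde{\Psi}_j| \Big). \qquad (30)
\end{aligned}$$

We then deduce

$$\begin{aligned}
\frac{M e^{-n\vartheta(\rho)} - 1}{M - 1} &\le \max_i \frac{1}{M-1} \sum_{j \neq i} |\tilde{\Psi}_i^\dagger \tilde{\Psi}_j| \qquad (31)\\
&\le \max_i \frac{1}{M-1} \sum_{j \neq i} (\Psi_i^\dagger \Psi_j)^{1/\rho} \qquad (32)\\
&\le \max_i \Big( \frac{1}{M-1} \sum_{j \neq i} \Psi_i^\dagger \Psi_j \Big)^{1/\rho}, \qquad (33)
\end{aligned}$$

where the last step is due to the Jensen inequality, since $t \mapsto t^{1/\rho}$ is concave for $\rho \ge 1$. Raising both sides to the power $\rho$, we obtain the inequality stated in the theorem.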



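To prove the corollary, simply note that

$$\begin{aligned}
\max_{i \neq j} \Psi_i^\dagger \Psi_j &\ge \max_i \frac{1}{M-1} \sum_{j \neq i} \Psi_i^\dagger \Psi_j \qquad (34)\\
&\ge \left( \frac{M e^{-n\vartheta(\rho)} - 1}{M - 1} \right)^{\rho} \qquad (35)\\
&\ge \left( e^{-n\vartheta(\rho)} - e^{-nR} \right)^{\rho}. \qquad (36)
\end{aligned}$$

The bound is trivial if $R \le \vartheta(\rho)$. If $R > \vartheta(\rho)$, we deduce again Lovász's result that there are two non-orthogonal codewords. But now we also have some further information: for $R > \vartheta(\rho)$, the second term in the parenthesis decreases exponentially faster than the first, which leads us to the conclusion that

$$-\frac{1}{n} \log \max_{i \neq j} \Psi_i^\dagger \Psi_j \le \rho\,\vartheta(\rho) + o(1). \qquad (37)$$

The bounds in terms of $E(R)$ are then obtained by simply taking the limit $n \to \infty$ and using the bound (9). ■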
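Inequality (14) can be sanity-checked on a random code. The sketch below assumes a BSC with $p = 0.1$ (so that $\Psi_i^\dagger\Psi_j = z^{d_H(\mathbf{x}_i,\mathbf{x}_j)}$ with $z = 2\sqrt{p(1-p)}$), a random binary code with $n = 8$ and $M = 20$, and, in place of the exact $\vartheta(\rho)$, the value of the hypothetical two-vector representation of degree $\rho$ with the bisector as handle. That value upper bounds $\vartheta(\rho)$, and since the right-hand side of (14) decreases in $\vartheta(\rho)$, the inequality must still hold for every code and every seed.

```python
import math
import random


def hamming(a, b):
    return sum(ai != bi for ai, bi in zip(a, b))


def theorem1_sides(code, p, rho, theta):
    """Left and right sides of (14) for a binary code used on a BSC(p).

    On a BSC, Psi_i^dagger Psi_j = z^{d_H(x_i, x_j)} with z = 2 sqrt(p (1 - p)).
    """
    z = 2 * math.sqrt(p * (1 - p))
    M, n = len(code), len(code[0])
    lhs = max(
        sum(z ** hamming(code[i], code[j]) for j in range(M) if j != i) / (M - 1)
        for i in range(M)
    )
    base = (M * math.exp(-n * theta) - 1) / (M - 1)
    return lhs, max(base, 0.0) ** rho        # a negative base makes (14) trivial


random.seed(0)
p, n, M = 0.1, 8, 20
z = 2 * math.sqrt(p * (1 - p))
code = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(M)]

checks = {}
for rho in (1.0, 2.0):
    # value of the two-vector degree-rho representation with the bisector as handle;
    # it upper bounds theta(rho), so (14) remains valid with it
    theta_upper = -math.log((1 + z ** (1 / rho)) / 2)
    checks[rho] = theorem1_sides(code, p, rho, theta_upper)
```

With these parameters $M e^{-n\vartheta} > 1$ for both values of $\rho$, so the right-hand side is strictly positive and the check is not vacuous.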

IV. Connections with other results in channel theory

A. Relation to known classical quantities 

A first important comment about $\vartheta(\rho)$ concerns the result obtained for $\rho = 1$; the value $\vartheta(1)$ is in fact simply the cut-off rate of the channel. Indeed, for $\rho = 1$, we can without loss of generality use the obvious representation $\tilde{\psi}_x = \psi_x$, $\forall x$, since any different optimal representation will simply be a rotation of this one (or an equivalent description in a space with a different dimension). In this case, all the components of all the vectors $\{\tilde{\psi}_x\}$ are non-negative, and this easily implies that the optimal $f$ can as well be chosen with non-negative components, since changing a negative component of $f$ to its absolute value can only improve the result. Thus, $f$ can be written as the square root of a probability distribution $Q$ on $\mathcal{Y}$ and we have

$$\begin{aligned}
\vartheta(1) &= \min_f \max_x \log \frac{1}{|\psi_x^\dagger f|^2} \qquad (38)\\
&= \min_Q \max_x \left( -2 \log \sum_y \sqrt{W(y|x)\, Q(y)} \right), \qquad (39)
\end{aligned}$$

where the minimum is now over all probability distributions $Q$. As observed by Csiszár [6, Proposition 1, with $\alpha = 1/2$], this expression equals the cut-off rate $R_1$ of the channel, defined as

$$\begin{aligned}
R_1 &= \max_P \left( -\log \sum_{x,x'} P(x)P(x') \sum_y \sqrt{W(y|x)W(y|x')} \right)\\
&= \max_P \left( -\log \sum_{x,x'} P(x)P(x')\, \psi_x^\dagger \psi_{x'} \right).
\end{aligned}$$

The identity $\vartheta(1) = R_1$ is certainly interesting. We point out, even if we cannot discuss it here, that it is related to an equivalence between the cut-off rate of a classical channel and the rate $R_\infty$ at which the sphere-packing bound for the pure-state classical-quantum channel with state vectors $|\psi_x\rangle$ diverges (see Section IV-B below and [5] for details).

Another important characteristic of the function $\vartheta(\rho)$ is observed in the limit $\rho \to \infty$. In the limit, the only constraint on the representations is that $|\tilde{\psi}_i^\dagger \tilde{\psi}_j| = 0$ whenever $|\psi_i^\dagger \psi_j| = 0$. Hence, when $\rho \to \infty$, the set of possible representations is precisely the same as the one considered by Lovász [?], and we thus have $\vartheta(\rho) \to \vartheta$ as $\rho \to \infty$. So, the value of $\vartheta(\rho)$ moves from the cut-off rate $R_1$ to the Lovász bound $\vartheta$ when $\rho$ varies from 1 to $\infty$. This clearly implies that our bound to $E(R)$ is finite for all $R > \vartheta$, and thus it allows us to bound the zero-error capacity of the channel as

$$\begin{aligned}
C_0 &\le \lim_{\rho \to \infty} \vartheta(\rho) \qquad (40)\\
&= \vartheta. \qquad (41)
\end{aligned}$$
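The identity $\vartheta(1) = R_1$ can be checked numerically on a binary-input, binary-output channel, where both optimizations are over a single scalar. The sketch below assumes a hypothetical channel `W`; it minimizes the objective of (39) over $Q = (q, 1-q)$ and the quadratic form in the definition of $R_1$ over $P = (a, 1-a)$ by ternary search. Both objectives are convex in the scalar: the first is a maximum of functions $-2\log(\text{concave})$, and the second is a quadratic form whose matrix $\{\psi_x^\dagger\psi_{x'}\}$ is a Gram matrix and hence positive semidefinite.

```python
import math

W = [[0.9, 0.1], [0.2, 0.8]]             # hypothetical binary-input, binary-output channel


def ternary_min(g, lo=0.0, hi=1.0, iters=200):
    """Minimize a convex function of one variable on [lo, hi] by ternary search."""
    for _ in range(iters):
        a, b = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(a) <= g(b):
            hi = b
        else:
            lo = a
    return g((lo + hi) / 2)


def theta1_objective(q):
    """max_x -2 log sum_y sqrt(W(y|x) Q(y)) with Q = (q, 1 - q), as in (39)."""
    Q = (q, 1 - q)
    return max(-2 * math.log(sum(math.sqrt(w * qy) for w, qy in zip(row, Q))) for row in W)


def bhatt(r, s):
    return sum(math.sqrt(a * b) for a, b in zip(r, s))


def cutoff_objective(a):
    """sum_{x,x'} P(x) P(x') psi_x^dagger psi_x' with P = (a, 1 - a)."""
    P = (a, 1 - a)
    return sum(P[i] * P[j] * bhatt(W[i], W[j]) for i in range(2) for j in range(2))


theta_1 = ternary_min(theta1_objective)             # min over Q in (39)
R_1 = -math.log(ternary_min(cutoff_objective))      # max over P of -log(...)
```

For this channel the two state vectors form an angle of 45 degrees, so both quantities equal $-\log\cos^2(\pi/8)$, attained by the bisecting handle on one side and the uniform input on the other.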



In order to understand what happens for finite $\rho > 1$, it is instructive to consider first a class of channels introduced by Jelinek [8]. These are channels for which the matrix $C$ with elements $C_{ij} = (\psi_i^\dagger \psi_j)^{1/\rho}$ is positive semidefinite for all $\rho \ge 1$. It was proved by Jelinek that, for these channels, the expurgated bound of Gallager [9] is invariant over $n$-fold extensions of the channel, that is, it has the same form when computed on a single channel use or on multiple channel uses (this is not true in general). Thus, if the conjecture made in [4, p. 77], that the expurgated bound computed on the $n$-fold channel is asymptotically tight when $n \to \infty$, is true, then for these channels the reliability function would be known exactly, since it would equal the expurgated bound for the single-use channel. It is also known that, for these channels, the inputs can be partitioned into subsets such that all pairs of symbols from the same subset are confusable and no pair of symbols from different subsets are confusable. The zero-error capacity in this case is simply the logarithm of the number of such subsets. For these channels, since the matrix $C$ is positive semidefinite, there exists a set of vectors $\tilde{\psi}_1, \tilde{\psi}_2, \ldots, \tilde{\psi}_K$ such that $\tilde{\psi}_i^\dagger \tilde{\psi}_j = C_{ij}$; that is, for all $\rho \ge 1$, representations of degree $\rho$ exist that satisfy all the constraints with equality. In this case, the equivalence with the cut-off rate that we have seen for $\rho = 1$ can, in a sense, be extended to other values of $\rho$. It can be proved [5, Th. 9] that we can write $\vartheta(\rho)$ as

$$\begin{aligned}
\vartheta(\rho) &= \min_f \max_x \log \frac{1}{|\tilde{\psi}_x^\dagger f|^2} \qquad (42)\\
&= \max_P \left( -\log \sum_{x,x'} P(x)P(x')\, \tilde{\psi}_x^\dagger \tilde{\psi}_{x'} \right) \qquad (43)\\
&= \max_P \left( -\log \sum_{x,x'} P(x)P(x')\, (\psi_x^\dagger \psi_{x'})^{1/\rho} \right) \qquad (44)\\
&= \max_P \left( -\log \sum_{x,x'} P(x)P(x') \Big( \sum_y \sqrt{W(y|x)W(y|x')} \Big)^{1/\rho} \right). \qquad (45)
\end{aligned}$$

Hence, under such circumstances, we find that $\vartheta(\rho) = E_x(\rho)/\rho$, where $E_x(\rho)$ is the value of the coefficient used in the expurgated bound of Gallager [9]. Note that, for each $\rho$, the expurgated bound is a straight line which intercepts the $R$ and $E$ axes at the points $E_x(\rho)/\rho$ and $E_x(\rho)$ respectively, which equal $\vartheta(\rho)$ and $\rho\,\vartheta(\rho)$. Hence, if the channel is pairwise reversible, then our bound is obtained by drawing the curve parameterized as $(E_x(\rho)/\rho,\, E_x(\rho))$ in the $(R, E)$ plane. This automatically implies that we obtain the bound

$$C_0 \le \lim_{\rho \to \infty} \frac{E_x(\rho)}{\rho}, \qquad (46)$$

which gives the precise value of the zero-error capacity in this case (which is, however, trivial) and, if $C_0 = 0$, the bound

$$\begin{aligned}
E(0) &\le \lim_{\rho \to \infty} 2\rho\,\vartheta(\rho) \qquad (47)\\
&= \lim_{\rho \to \infty} 2E_x(\rho) \qquad (48)\\
&= 2E_{ex}(0). \qquad (49)
\end{aligned}$$

If the channel is pairwise reversible, this can then be improved to $E(0) \le E_{ex}(0)$, which is obviously tight.
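For a binary symmetric channel the umbrella curve can be traced in closed form. The sketch below assumes that the BSC is a Jelinek channel (its $2 \times 2$ matrix $C$ has eigenvalues $1 \pm z^{1/\rho} \ge 0$) and pairwise reversible by symmetry, and that the uniform input maximizes $E_x(\rho)$ (the quadratic form reduces to $1 - 2a(1-a)(1 - z^{1/\rho})$, which is smallest at $a = 1/2$). Under these assumptions $\vartheta(\rho) = -\log\frac{1 + z^{1/\rho}}{2}$ with $z = 2\sqrt{p(1-p)}$, the bound reads $E(R) \le \rho\,\vartheta(\rho)$ for $R > \vartheta(\rho)$, and the points approach $(0, E_{ex}(0))$ with $E_{ex}(0) = -\tfrac{1}{2}\log z$.

```python
import math


def bsc_umbrella_point(p, rho):
    """(theta(rho), rho * theta(rho)) for a BSC(p), using theta(rho) = E_x(rho)/rho with uniform P."""
    z = 2 * math.sqrt(p * (1 - p))       # psi_0^dagger psi_1
    theta = -math.log((1 + z ** (1 / rho)) / 2)
    return theta, rho * theta


p = 0.1
rhos = [1, 2, 4, 16, 256, 10**6]
# for R > theta(rho): E(R) <= rho * theta(rho) (no factor 2, the BSC being pairwise reversible)
points = [bsc_umbrella_point(p, rho) for rho in rhos]
e_ex0 = -0.5 * math.log(2 * math.sqrt(p * (1 - p)))   # lim_{rho -> inf} E_x(rho) for the BSC
```

As $\rho$ grows the rate threshold $\vartheta(\rho)$ shrinks toward $C_0 = 0$ while the exponent $\rho\,\vartheta(\rho)$ increases toward $E_{ex}(0)$.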

For general channels with a non-trivial zero-error capacity, for example a channel whose confusability graph is a pentagon, the matrix $C$ is in general positive semidefinite only for not too large values of $\rho$, and it stops being positive semidefinite for $\rho$ large enough. This implies that representations that satisfy the constraints with equality exist only for $\rho$ not too large. For larger $\rho$, the two expressions in (43) and (44) are no longer equal, and in general they could both differ from $\vartheta(\rho)$. If all the values $\tilde{\psi}_i^\dagger \tilde{\psi}_j$ are nonnegative,² however, then it can be proved that the expression in (43) equals $\vartheta(\rho)$ [5, Th. 9]. In this case, we see the interesting difference between $\vartheta(\rho)$ and $E_x(\rho)/\rho$. The two quantities follow (43) and (44), respectively: when $\rho \to \infty$, the first one tends to an upper bound to $C_0$, while the second one tends to the logarithm of the independence number of the confusability graph of the channel, a lower bound to $C_0$ (this is the same value at which the expurgated bound goes to infinity, see [10]).

It is worth pointing out that, for some channels (for example, all channels whose confusability graph is a pentagon), the optimal representation may even stay fixed for $\rho$ larger than some finite value $\rho_{\max}$. In this case, the bound is useless for $\rho > \rho_{\max}$, since $\vartheta(\rho)$ no longer decreases while $2\rho\,\vartheta(\rho)$ keeps growing.
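The loss of positive semidefiniteness of $C$ can be located exactly for a simple pentagon channel. The sketch below assumes the five-input noisy typewriter, $W(y|x) = 1/2$ for $y \in \{x, x+1 \bmod 5\}$, whose confusability graph is a pentagon and for which $\psi_x^\dagger\psi_{x\pm1} = 1/2$ and all other off-diagonal overlaps vanish. Then $C$ is a symmetric circulant matrix with eigenvalues $\sum_d C_{0d}\cos(2\pi k d/5)$, and its smallest eigenvalue $1 + 2(1/2)^{1/\rho}\cos(4\pi/5)$ turns negative at $\rho^* = \log(1/2)/\log\big(1/(2\cos(\pi/5))\big) \approx 1.44$.

```python
import math

K = 5   # noisy typewriter: W(y|x) = 1/2 for y in {x, x+1 mod 5}; confusability graph = pentagon


def overlap(x, xp):
    """psi_x^dagger psi_x' for the noisy typewriter."""
    d = (x - xp) % K
    return 1.0 if d == 0 else 0.5 if d in (1, K - 1) else 0.0


def min_eig_C(rho):
    """Smallest eigenvalue of C_ij = (psi_i^dagger psi_j)^(1/rho).

    C is a symmetric circulant matrix, so its eigenvalues are sum_d C_0d cos(2 pi k d / K).
    """
    row = [overlap(0, d) ** (1 / rho) for d in range(K)]
    return min(sum(row[d] * math.cos(2 * math.pi * k * d / K) for d in range(K)) for k in range(K))


# C stays PSD exactly while 1 + 2 (1/2)^(1/rho) cos(4 pi / 5) >= 0
rho_star = math.log(0.5) / math.log(1 / (2 * math.cos(math.pi / 5)))
```

Below $\rho^*$ the vectors realizing $C$ form a representation that meets every constraint in (11) with equality; above it no such representation exists.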

A final comment is about the computation of this bound. There is no essential difference with respect to the evaluation of the Lovász theta function. The optimal representation $\{\tilde{\psi}_x\}$, for any fixed $\rho$, can be obtained by solving a semidefinite optimization problem. If we consider the $(K+1) \times (K+1)$ Gram matrix

$$G = [\tilde{\psi}_1, \ldots, \tilde{\psi}_K, f]^\dagger\, [\tilde{\psi}_1, \ldots, \tilde{\psi}_K, f], \qquad (50)$$

we note that finding the optimal representation amounts to solving the problem

$$\begin{aligned}
\max\ & V\\
\text{s.t. } & G(k, K+1) \ge V, \quad \forall k \le K\\
& G(k,k) = 1, \quad \forall k\\
& |G(k,i)| \le (\psi_k^\dagger \psi_i)^{1/\rho}, \quad 1 \le k \le K,\ k < i \le K \qquad (51)\\
& G \text{ is positive semidefinite}.
\end{aligned}$$

The solution $V^*$ to this problem gives the value of the optimal representation, $\vartheta(\rho) = -2 \log V^*$, and both the representation vectors $\{\tilde{\psi}_x\}$ and the handle $f$ can be obtained by means of the spectral decomposition of the optimal $G$.

² We conjecture that the optimal representation, in terms of Lovász's definition of value, always satisfies this condition. We have not yet investigated this aspect, but have never found a counterexample.
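Solving (51) in general calls for an SDP solver, which the sketch below does not include. It only illustrates the structure of (50)–(51) by building a candidate Gram matrix and checking its constraints. The assumptions are the five-input noisy typewriter (a pentagon channel, with $\psi_k^\dagger\psi_{k\pm1} = 1/2$) and a candidate taken from Lovász's pentagon umbrella. The umbrella's neighbouring ribs overlap by $(\sqrt{5}-1)/2 \approx 0.618$, so the candidate violates (51) at $\rho = 1$ but is feasible at $\rho = 2$, where it yields $-2\log V = \log\sqrt{5}$. Since every representation of degree $\rho$ is also a Lovász representation, $\vartheta(\rho) \ge \vartheta = \log\sqrt{5}$, which suggests that this candidate is optimal wherever it is feasible, consistent with the remark on $\rho_{\max}$.

```python
import math

K = 5
cos_t = 5 ** -0.25                        # umbrella with cos^2 = 1/sqrt(5)
sin_t = math.sqrt(1 - cos_t ** 2)
ribs = [
    (cos_t, sin_t * math.cos(2 * math.pi * k / K), sin_t * math.sin(2 * math.pi * k / K))
    for k in range(K)
]
cols = ribs + [(1.0, 0.0, 0.0)]           # columns [psi~_1, ..., psi~_K, f]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


G = [[dot(a, b) for b in cols] for a in cols]   # Gram matrix (50), PSD by construction


def overlap(k, i):
    """psi_k^dagger psi_i for the noisy typewriter channel (illustrative choice)."""
    d = (k - i) % K
    return 1.0 if d == 0 else 0.5 if d in (1, K - 1) else 0.0


def check_51(G, rho, tol=1e-12):
    """Return (feasible, V): whether G meets the constraints of (51), and the largest admissible V."""
    V = min(G[k][K] for k in range(K))  # G(k, K+1) in the 1-based notation of (51)
    unit = all(abs(G[k][k] - 1) <= tol for k in range(K + 1))
    overlaps = all(
        abs(G[k][i]) <= overlap(k, i) ** (1 / rho) + tol
        for k in range(K) for i in range(k + 1, K)
    )
    return unit and overlaps, V


feasible_1, _ = check_51(G, 1.0)
feasible_2, V = check_51(G, 2.0)
value = -2 * math.log(V)                  # value of this representation, an upper bound on theta(2)
```

A full solver would optimize over all positive semidefinite $G$ subject to the same linear constraints; this check can be used to validate such a solver's output.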

B. Relation to classical-quantum channels 

The bound of Section III can be interpreted as a simple variation of Lovász's argument toward bounding $E(R)$. It can be proved that Lovász's idea is intimately related to the sphere-packing bound [?], this connection being revealed in the context of quantum information theory [5]. The bound to $E(R)$ derived here is a special case of a more general bound that can be derived by properly applying the sphere-packing bound for classical-quantum channels [11], [?]. In particular, while the construction of the representation $\{\tilde{\psi}_x\}$ was introduced here as a purely mathematical trick to bound $E(R)$ by means of a geometrical representation of the channel, this procedure can be interpreted in the context of classical-quantum channels as a natural way to bound $E(R)$ by comparing the original channel with an auxiliary one. The simplest case is actually obtained by taking an auxiliary pure-state channel with state vectors $|\tilde{\psi}_x\rangle$.

In the classical case, Lovász's result came completely unexpectedly, since it involves the unconventional idea of using vectors with negative components to play the same role as $\sqrt{W(y|x)}$. When formulated in the classical-quantum setting, however, this approach becomes completely transparent and does not require pushing imagination out of the original domain.